ABSTRACT

The PRODIGY/EBL system [Minton88] was one of the first works to directly attack the problem of
strategy utility. The problem of finding effective strategies was reduced to the problem of finding
effective rules. However, this paper illustrates limitations of the approach. There are two basic
difficulties. The first arises from the fact that the utility of a control rule cannot be accurately
determined from a single instance of the rule. This is a manifestation of a more basic problem which
we term the utility generalization problem. The difficulty is that the generalization techniques
employed by speed-up learning systems are accuracy preserving but not utility preserving. The second
difficulty is that control rules interact such that the utility of one control rule is a function of
the other control rules in the system. This composability problem means that systems cannot reduce
the problem of learning effective strategies to the problem of identifying rule utility in isolation.
We document the seriousness of these problems with an example domain theory. With this theory,
PRODIGY/EBL generates control strategies which are up to 17 times slower than the original planner.
While this raises serious questions about the effectiveness of PRODIGY/EBL, we also claim that the
utility generalization and composability problems are basic issues which are not adequately addressed
by current speed-up learning techniques. We introduce an alternative technique called COMPOSER. This
system is based on a sound statistical model which is validated with a series of experiments.
COMPOSER successfully avoids the utility generalization and composability problems. (Contains 33
references.)



1 INTRODUCTION 



There is considerable research in machine learning into techniques to improve problem solving
ability. Unfortunately, "speed-up learning" systems can result in substantial performance degradation
[Etzioni90a, Minton85, Mooney89, Subramanian90, Tambe89]. Additionally, empirical claims of success
are frequently shown to be sensitive to subtle changes to the experimental conditions [Gratch90,
Mooney89, Segre91, Subramanian90]. It is not surprising that a basic question dominates research in
this area: what is the value of knowledge?

There are two major approaches to identifying "good" knowledge. The first places syntactic
restrictions on the learning mechanism such that it only generates beneficial knowledge. Researchers
try to identify a set of domain independent syntactic constraints to discriminate helpful from
harmful knowledge. Learning systems can then be designed to obey these constraints. We will use the
term operationality criteria [Mitchell86] to refer to any set of domain independent syntactic
constraints which limit the generation of knowledge. Many criteria have been proposed [Etzioni90a,
Letovsky90, Segre87, Subramanian90].

A second approach is to compute a numeric estimate of the value of knowledge. This estimate is then
used to discard harmful knowledge. The learning system implements a cost model and estimates
parameters of this model through direct observation of problem solving behavior within a particular
domain [Gratch91, Keller87, Leckie91, Minton88, Yoo91]. We use the term utility analysis for
techniques which directly estimate the value of knowledge. The two approaches complement each other.
Utility analysis allows an inexact operationality criterion. An accurate operationality criterion
reduces the burden on utility analysis.

Figure 1: PRODIGY/EBL learning curve illustrating a harmful control strategy.
Results are averaged over ten trials.

In this paper we relate an in-depth investigation of one approach to utility analysis: the utility
analysis method of PRODIGY/EBL [Minton88]. This is an approach which has shown empirical success on
several domains. Unfortunately, this success is not guaranteed. Figure 1 illustrates a learning
curve for PRODIGY/EBL on an artificial domain (described below). The learned strategy actually
degrades performance by an order of magnitude. As we will show, there are many issues not adequately
addressed by the PRODIGY/EBL method. We will then argue that these are fundamental problems, and are
not adequately addressed by current approaches to utility analysis or operationality.

2 REVIEW OF PRODIGY/EBL

PRODIGY/EBL [Minton88] is a learning approach which enhances the effectiveness of an underlying
STRIPS-like planner. The system uses explanation-based learning (EBL) [DeJong86, Mitchell86] to
produce control rules from traces of problem solving behavior. Control rules are condition-action
statements which alter the way the PRODIGY planner explores its problem spaces. By default, the
planner lists all operators which unify with an unachieved goal and explores these alternatives
depth-first. Control rules change the search by discarding or reordering some alternatives. Figure 2
illustrates a control rule learned by PRODIGY/EBL on the blocksworld domain. The blocksworld has
several operators for clearing a block. RULE-1 asserts that when the block to be cleared is not
being held, the planner should consider only the UNSTACK operator.

RULE-1: IF   current-node is ?n
             current-goal at ?n is (CLEAR ?x)
             (NOT (HOLDING ?x)) is true at ?n
        THEN choose operator UNSTACK

Figure 2: An example control rule

2.1 Utility Analysis

Not all control rules increase the efficiency of planning. Control rules avoid search in the problem
space; however, they introduce the cost of matching their preconditions. A rule is harmful when the
precondition evaluation cost exceeds the savings. PRODIGY/EBL incorporates utility analysis to avoid
this situation. Minton proposes a cost model to capture the tradeoff between a control rule's
savings and precondition match cost. The model associates a utility value with each control rule:

$$\text{UTILITY}(rule) = \text{Average\_savings}(rule) \times \text{Success\_rate}(rule) - \text{Match\_cost}(rule) \tag{1a}$$

The utility of a control rule is the difference between the savings it produces (attenuated by the
percentage of the time its preconditions are satisfied) and its precondition match cost.
Average_savings, Success_rate, and Match_cost are parameters of the model which the system must
estimate. Unfortunately, it is difficult to measure Average_savings directly. To do so would require
exploring the portions of the problem space which the rule avoids, nullifying the effect of the
rule. To avoid this difficulty, PRODIGY/EBL is implemented with a simplified cost model:

$$\text{UTILITY}_{\text{PERCEIVED}}(r) = \text{Initial\_savings}(r) \times \text{Success\_rate}(r) - \text{Match\_cost}(r) \tag{1b}$$

This model derives Average_savings from the savings which result on the instance from which the
control rule was learned. Success_rate and Match_cost are directly measured from subsequent problem
solving experience. Minton assumes that perceived utility (Equation 1b) will be a close
approximation to the true utility (Equation 1a).
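The difference between the two cost models can be made concrete with a few lines of Python. This is a minimal sketch rather than PRODIGY/EBL's implementation: the function names and every numeric value are hypothetical, chosen only so that the single-instance estimate and the true average disagree in sign.

```python
def utility(average_savings, success_rate, match_cost):
    """Equation 1a: utility computed from the true average savings."""
    return average_savings * success_rate - match_cost


def perceived_utility(initial_savings, success_rate, match_cost):
    """Equation 1b: same form, but savings come from the one training instance."""
    return initial_savings * success_rate - match_cost


# Hypothetical values (not taken from any experiment).
success_rate = 0.5      # fraction of match attempts whose preconditions hold
match_cost = 4.0        # average cost of evaluating the preconditions
average_savings = 6.0   # mean savings over all situations where the rule fires
initial_savings = 20.0  # savings observed on the instance the rule was learned from

true_u = utility(average_savings, success_rate, match_cost)
perceived_u = perceived_utility(initial_savings, success_rate, match_cost)
print("true utility:", true_u)
print("perceived utility:", perceived_u)
```

With these assumed numbers the rule looks beneficial under Equation 1b although it is harmful under Equation 1a, which is the failure mode examined in the rest of the paper.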

2.2 Defining "Cost"

PRODIGY/EBL is based on an average cost model of utility. That is, the model considers a control 
strategy effective if it reduces average problem solving cost. This model does not entail that the cost 
of any particular problem will be reduced. Rather, the cost to solve any representative sample of 
problems will be less. The average cost model is ubiquitous in the speed-up learning community 
and we will not discuss its merits in this paper. One should be aware, however, that alternatives do 
exist. 

Speed-up learning systems reduce the cost of problem solving. Therefore, it is paramount to define
"cost" precisely. Many criteria are in use. One possibility is to emphasize solution quality, either
by guaranteeing optimality [Mostow89] or by defining cost metrics which prefer quality solutions
[Eskey90]. Accuracy is another important dimension: what is the ratio of solvable to unsolvable
problems? PRODIGY/EBL defines cost as the CPU seconds required to solve problems. The system
actually measures the time required to perform certain processes, and tries to reduce total problem
solving time.

The choice of a cost criterion can have a dramatic impact on system behavior. This issue is explored
in detail in [Segre91]. We will briefly illustrate the difficulties in the context of PRODIGY/EBL.
PRODIGY/EBL tries to reduce problem solving cost. This is easily accomplished by a single control
rule which immediately fails to solve a problem. But this "fast" strategy reduces the accuracy of
the problem solver to zero. Instead, the rule generator constrains control rules to be "truth
preserving" in the sense that they only eliminate provably irrelevant portions of the search space.
Thus, presumably, if a problem is solvable, it cannot become unsolvable with learning.

Unfortunately, there is a further complication. Problem solving is combinatorially expensive.
Planners, like PRODIGY, impose resource limitations on their problem solving. This makes problem
solving tractable at the expense of accuracy. The planner simply aborts problem solving when it
reaches the resource limit. It might appear that PRODIGY/EBL can enhance accuracy by simply
minimizing problem solving cost. This happens when a problem which is too expensive to solve becomes
solvable with the learned strategy. But a learned control strategy can also reduce accuracy. This is
a legacy of the average cost model of utility. While reducing average cost, a strategy can increase
the cost of certain problems. The technique reduces accuracy if the solutions to these problems
require more resources than the limit allows.

Finally, there is the issue of how to account for the resources expended during training. The most
popular approach is to assume learning cost can be amortized over a large body of test problems.
Minton adopts this approach and learning cost does not participate in his performance data. There
are some alternative approaches. The training phase can be shown to be tractable (i.e., polynomial)
[Natarajan89, Tadepalli91]. Another possibility is to include training time in the cost models
[Yamada91].

2.3 Assumptions 

In this paper we will preserve several of the assumptions embodied in PRODIGY/EBL. Therefore, we
assume our goal is to increase the efficiency of satisficing search [Simon75]. In this situation the
problem solver may search for any valid solution. Therefore, as in PRODIGY/EBL, solution quality is
not an issue. We will also discount training cost, assuming it can be amortized over future problem
solving. We make one additional assumption to avoid tradeoffs between efficiency and accuracy. For
the remainder of this paper we assume that all problems are solvable within the resource bounds of
the PRODIGY planner. Together with the assumption that PRODIGY/EBL generates truth preserving rules,
this ensures that reducing CPU cost does not affect problem solving accuracy.

3 CRITIQUE OF PRODIGY/EBL: Single-rule strategies 

For the moment we will ignore how rules combine and consider the reduced problem of finding an
effective single-rule control strategy. The system may learn a control rule if the planner finds no
solution in a large subtree of the problem space. PRODIGY/EBL analyzes such instances of wasted
effort and proposes control rules to avoid the situation. What we will show is that PRODIGY/EBL
cannot accurately determine rule utility. As will become apparent, we refer to this as the utility
generalization problem.

3.1 Savings Variance

When PRODIGY/EBL learns a control rule, it applies to a particular planning context. This context is
defined by a world state and a set of unsatisfied goals. PRODIGY/EBL uses analytic techniques to
generalize the rule, ignoring aspects of the context which did not participate in the failure. The
resulting rule can then apply to a large set of planning situations and we are guaranteed that in
each instance, the rule avoids fruitless alternatives.¹ While the generalization preserves
correctness, it does not guarantee that the savings observed in the training instance will reflect
the savings everywhere the rule applies. If savings vary too much, it is unlikely that the initial
observation of rule savings will reflect the average. This violates the assumption that perceived
utility approximates true utility.

In practice, the savings induced by a rule is highly dependent on information dropped by
generalization. To illustrate this, consider Figure 3, which displays a portion of a search space
for the blocksworld domain. Boxes contain the current goals at a node. Operators connect boxes. The
goal is to clear block B. There are three operators which achieve this effect: UNSTACK, PUTDOWN, and
STACK. Each operator may apply to multiple blocks. This results in ten alternative paths for
satisfying the goal. Assume that we explore the space from left to right and from top to bottom. If
the rule in Figure 2 is available to the planner, it eliminates the two alternatives using the
PUTDOWN operator and the four alternatives using the STACK operator (six of the ten choices).



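[Figure 3 diagram: the goal node branches into ten operator instantiations, PUTDOWN(A), PUTDOWN(B),
STACK(A,A), STACK(A,B), STACK(B,A), STACK(B,B), UNSTACK(A,A), UNSTACK(A,B), UNSTACK(B,A), and
UNSTACK(B,B). The PUTDOWN and STACK branches are marked as the search space avoided by the
application of RULE-1. An inset shows the initial state.]

Figure 3: An example search space for the blocksworld domain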

Next, consider adding an irrelevant block to the table. This creates more ways to instantiate
PUTDOWN and UNSTACK (three for PUTDOWN and eight for UNSTACK), each of which is trimmed by RULE-1.
In general, this rule saves $n + n(n-1)$ alternatives, where n is the number of blocks in the
current state. The lesson is that the savings provided by a rule on its generalized set of instances
may vary greatly [Gratch91].

3.2 Quantifying the Effects of Variance

We can borrow notions from statistics to understand how the variance in savings affects utility
analysis. We can view the savings that a control rule provides as a random variable (SAV). Rule
savings can then be described by its average ($\overline{SAV}$) and a probability density function
(p.d.f.). The bell-shaped curves in Figure 4 are examples of p.d.f.'s. The horizontal axis describes
legal values for the random variable. The vertical axis represents probability. The probability that
an instance of the variable will lie within a specified range is the integral of the p.d.f. over
that range. Average savings must be attenuated by success rate before it can be compared with the
match cost. To simplify the discussion, we define another random variable
$S = SAV \times \text{Success\_rate}(r)$, whose average is
$\bar{S} = \overline{SAV} \times \text{Success\_rate}(r)$.

1. This guarantee only holds for control rules which eliminate alternatives. PRODIGY/EBL can also
learn rules which re-order alternatives. These "preference rules" are only heuristics.

We can use these properties to discuss the likelihood that the utility of a single-rule strategy
will be misrepresented. There are two ways in which the utility analysis can err. Either the system
can keep a harmful rule (false positive) or the system can discard a helpful rule (false negative).
Each case is the dual of the other, so we only discuss the case of false positives.

Two conditions must hold to retain a control rule with negative utility. First, the learning module
must generate a rule with negative utility (a failure of the operationality criteria). Second, the
rule must have positive perceived utility (a failure of utility analysis). In terms of Equations 1a
and 1b, $\bar{S} < \text{Match\_cost}(r)$ and
$\text{Initial\_savings}(r) \times \text{Success\_rate}(r) > \text{Match\_cost}(r)$. The likelihood
of the former depends on the effectiveness of the generation bias. The likelihood of the latter
depends on the p.d.f. for SAV. Problems with operationality criteria are discussed in Section 7.
Here we consider the probability of a false positive given that the generator produced a rule with
negative utility.

If a generated rule has negative true utility, it can be mistakenly retained. Figure 4 illustrates
the probability of a false positive for two such control rules. The control rules have identical
p.d.f.'s but different match costs. $\bar{S}_i$ is the average savings times success rate for rule
i. $C_i$ is the match cost for the rule. |Utility| is the difference between $\bar{S}_i$ and $C_i$.
Control rule 2 has greater match cost and therefore its utility is more negative than the utility of
control rule 1.




Figure 4: Probability of a false-positive for two control rules. 

To mistakenly retain a rule, the system must overestimate the average savings such that savings
appears greater than cost. As this estimate is based on a single observation drawn from the p.d.f.,
the chance of obtaining a false positive is simply the probability mass to the right of the average
cost (the shaded region of the p.d.f.). Notice that the probability of misclassifying control rule 2
is much less than the probability of misclassifying control rule 1. This p.d.f. has the desirable
property that as the difference between savings and cost grows, the probability of misclassification
diminishes. In other words, when mistakes are made, they are likely to be small. A very different
situation is illustrated in Figure 5. In this case the p.d.f. has a bi-modal distribution. This is
an example of one class of p.d.f.'s which can allow large mistakes to occur with high probability.




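[Figure 5 plot: a bi-modal p.d.f. over savings, with the horizontal axis running from 0 to ∞ and
|Utility| marked. The probability of a false positive is the integral of the p.d.f.,
$\int f(x)\,dx$, over the region to the right of the match cost.]

Figure 5: Probability of a false-positive given a bi-modal p.d.f.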
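The contrast between a bell-shaped and a bi-modal p.d.f. can be checked numerically. The sketch below is illustrative only: the two distributions, their parameters, and the function names are assumptions made here, not quantities measured in PRODIGY/EBL. Both distributions have the same mean, and the false-positive probability is computed as the tail mass to the right of the match cost.

```python
import math


def normal_tail(threshold, mean, std):
    """P(X > threshold) for X ~ Normal(mean, std)."""
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def mixture_tail(threshold, components):
    """P(X > threshold) for a mixture given as (weight, mean, std) triples."""
    return sum(w * normal_tail(threshold, m, s) for w, m, s in components)


mean_s = 10.0  # assumed mean of S, identical for both distributions

# Uni-modal case: a single bell curve centered on the mean.
unimodal = [(1.0, 10.0, 3.0)]
# Bi-modal case with the same mean: most instances save little, a few save a lot.
bimodal = [(0.8, 2.0, 1.0), (0.2, 42.0, 1.0)]

for match_cost in (12.0, 20.0, 30.0):
    p_uni = mixture_tail(match_cost, unimodal)
    p_bi = mixture_tail(match_cost, bimodal)
    print(f"match cost={match_cost:5.1f}  |utility|={match_cost - mean_s:5.1f}  "
          f"P(false positive): uni-modal={p_uni:.6f}  bi-modal={p_bi:.6f}")
```

Under these assumptions the uni-modal tail shrinks rapidly as the rule becomes more harmful, while the bi-modal tail stays near the weight of the high-savings mode, so a very harmful rule is retained about as often as a mildly harmful one.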

The previous discussion highlights the importance of bounded error. False positives may be
acceptable if we ensure that the mistakes are small. We can ensure Equation 1 exhibits bounded error
if a combination of the following properties holds:

1) small average match cost: in the worst case a rule will save nothing. Utility is then
0 - Match_cost(r). By bounding match cost we can guarantee small negative utility.

2) savings has small variance: this reduces the likelihood of large discrepancies between estimated
and actual savings.

3) savings is normally distributed: this ensures that the likelihood of a false positive diminishes
with the harmfulness of the rule.

Some of these properties hold in the domains PRODIGY/EBL is tested on. For example, problems are
generated by a procedure which randomly varies several problem parameters. These parameters exhibit
little variance. In the STRIPS domain the number of blocks present in the world varies from two to
five. In the scheduling domain the number of objects varies from two to four. The savings for many
control rules learned in these domains vary with these parameters. Because the parameters vary so
little, these control rules exhibit the small savings variance property.

4 DOCUMENTING SINGLE-RULE MISTAKES

The preceding section illustrates the shortcomings of the utility analysis described by Equation
1b. In this section we illustrate a simple domain (Figure 6) which exhibits this problem. The domain
theory will also be utilized in the experiment in Section 6. The domain is for a robot assembly task
where the goal is to construct a component from its parts. All parts for a component are contained
in a parts bin. If all the parts in the bin are free of defects, the component may be assembled.
Otherwise another bin must be found.

ASSEMBLER-COMPONENTS
  PRECONDITIONS:
    ∃ ?BIN : parts-bin(?BIN)
             defect-free-components(?BIN)
  ADD:
    assembly-complete()

INSPECT-BIN
  PRECONDITIONS:
    ∀ ?PART : in-bin(?PART ?BIN)
              good(?BIN ?PART)
  ADD:
    defect-free-components(?BIN)

Figure 6: A simple assembly domain

When PRODIGY/EBL is given a problem in this domain, it considers multiple instantiations of the
INSPECT-BIN operator, one for each bin in the initial state. If the first bin contains a defect, it
produces the control rule in Figure 7. As future problems are solved, this rule avoids
instantiations of INSPECT-BIN which lead to failure. As with RULE-1 above, the savings provided by
this rule depends on information not mentioned in the rule. The savings increases as we increase the
number of parts per bin. If bin size exhibits a large variance then the small savings variance
property will be violated. We exploit this property to demonstrate single-rule failures of Equation
1b.



RULE-2: IF   current-node is ?node
             current-goal at ?node is assembly-complete()
             current-operator at ?node is INSPECT-BIN
             candidate-bindings at ?node is (?bin)
             ∀ ?part : in-bin(?part ?bin)
                       good(?bin ?part)
             candidate-bindings at ?node is (?other-bin)
        THEN prefer (?bin) to (?other-bin)

Figure 7: A control rule from the assembly domain

It is not immediately apparent why RULE-2 would be conjectured. The control rule examines all
contents of a bin to decide if the planner should examine all contents of a bin. In fact, this rule
reduces planning time in many cases. The rule avoids the overhead of generating a problem space
(generating intermediate nodes, searching the domain theory for relevant operators, etc.). However,
the potential effectiveness of the control rule is irrelevant to this discussion. The importance of
utility analysis is that it permits harmful rules to be generated, on the expectation that they will
later be identified and discarded. What we demonstrate in this section is that PRODIGY/EBL fails in
this task. The reasons for this failure are independent of the actual form of the control rule.

4.1 Methodology

We violate the small savings variance property by creating a problem distribution which varies bin
size bi-modally. We accomplish this with two classes of problems. In each problem class the first
bin contains defects. This forces the planner to backtrack and, consequently, to produce RULE-2.
Equation 1b credits the rule with an average savings commensurate with the number of parts in this
bin. The first class contains problems with fifty bins of two parts each. The second class contains
problems with two bins of two hundred parts each. If PRODIGY/EBL learns the rule on a problem from
the first class, it should have little perceived savings. If it learns the rule on a problem from
the second class, it should have high perceived savings.
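The effect of this bi-modal design on a single-instance savings estimate can be sketched as follows. The sketch makes a strong simplifying assumption that is not in the original: the savings credited to the rule are taken to be exactly proportional to the number of parts in the first bin (the text says only "commensurate"). The function names and the unit cost per part are likewise hypothetical.

```python
import random
import statistics

# (number of bins, parts per bin) for the two problem classes described above.
PROBLEM_CLASSES = [(50, 2), (2, 200)]


def sample_problem(rng):
    """Draw one of the two problem classes with equal probability."""
    return rng.choice(PROBLEM_CLASSES)


def initial_savings(problem, cost_per_part=1.0):
    """Assumed proxy: credited savings scale with the size of the first bin."""
    _bins, parts_per_bin = problem
    return cost_per_part * parts_per_bin


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observations = [initial_savings(sample_problem(rng)) for _ in range(10_000)]

possible_estimates = sorted(set(observations))
mean_savings = statistics.fmean(observations)
print("possible single-instance estimates:", possible_estimates)
print("mean over the problem distribution:", round(mean_savings, 1))
```

Under this assumption every single-instance estimate sits far from the mean of the distribution, so whichever class the rule is learned on, the value plugged into Equation 1b is a poor stand-in for average savings.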

Problems are randomly generated, half from the first class and half from the second. Using this
distribution, we train PRODIGY/EBL following the methodology outlined in Minton's thesis [Minton88,
pp. 117-118]. We present the system with 100 training problems followed by a "settling phase" of 25
problems. The settling phase is required so that control rules learned at the end of the training
phase can undergo utility analysis. This regimen is repeated for ten independent trials with
different problem sets (from the same distribution) on each trial. All trials were executed on an
IBM RT 125 with 16MB of memory, using LUCID Common LISP and PRODIGY 2.0.²



4.2 Results 

Results are summarized in Table 1, which reports the mean problem solving time across the ten
trials. We computed a 95% confidence interval for each mean using a t-test. We also generated a
learning curve, illustrated in Figure 1. This is constructed with the same regimen but varying the
size of the training set.



System Type          Execution Time (100 problems)
without learning      346 ± 9  CPU sec.
with learning        5839 ± 98 CPU sec.

Table 1: Empirical results from single-rule experiment

RULE-2 produces a large performance degradation for problems from the first class (50 bins of 2
parts each), and a moderate performance enhancement for problems of the second class (2 bins of 200
parts each). The overall effect is a large performance degradation. If the rule is learned on a
problem from the first class, PRODIGY/EBL uniformly perceives the rule to produce little savings. In
this case utility analysis correctly discards the rule. If learned from the second class, the system
uniformly perceives the rule to have high savings. Thus, the rule is mistakenly retained. Discarded
rules may be relearned, so eventually the rule is learned on a problem from the second class.

The results indicate that learning substantially degrades problem solving performance. From this
we can conclude that perceived utility can substantially diverge from true utility. Thus the utility
analysis embodied by Equation 1b can retain rules with high negative utility.

The experiment also illustrates the potential to discard a good rule (false negative). PRODIGY/EBL
produces a small savings estimate for RULE-2 if it is learned on a problem with small bin size. This
results in an underestimate of savings when the system solves problems with large bin size. In this
experiment the underestimate did not affect the performance of utility analysis because the rule has
negative utility. However, false negatives could result if the rule had positive utility. This could
be achieved, for example, by increasing the likelihood of problems with large bin size.

5 CRITIQUE OF PRODIGY/EBL: Multi-rule strategies

As we have seen, Equation 1b may misrepresent the utility of a control rule. This section
illustrates that even with accurate savings estimates, this utility analysis can still produce
undesirable results. The problem is that control rules may interact such that the utility of
multiple control rules cannot be predicted by simply knowing their utilities in isolation. This
dependency is noted in Markovitch's definition for the value of knowledge [Markovitch89, pp. 6-7].
We call this property the composability problem.

2. PRODIGY is available through Carnegie Mellon University. Contact prodigy@cs.cmu.edu. The domain
theory and problem generators used in these experiments are available upon request from the authors.
Contact gratch@cs.uiuc.edu.

There are many ways that the presence of one control rule can influence the utility of another. Two
rules may avoid the same areas of the problem space. As there is no added benefit in ignoring an
area twice, the utility of the rules together is not equivalent to the sum of their individual
utilities. A subtle example occurs when a control rule has different match costs in different
portions of the problem space. A second control rule which removes portions of this problem space
may substantially change the average match cost of the first rule.

A particular interaction between two control rules is illustrated in Figure 8. This shows a
hypothetical problem space of fifteen nodes. Suppose r and s are two control rules which prune the
nodes in sets R and S respectively when compared to problem solving with no control rules. $|R|$ is
the number of nodes trimmed by rule r. $|S|$ is similarly defined. When used in isolation, rule r is
checked six times (i.e., $15 - |R|$). It successfully applies twice: at node 2, saving nodes 3-8,
and at node 9, saving nodes 10-12. Rule s is checked eight times (i.e., $15 - |S|$) and succeeds at
node 1, saving nodes 9-15. Assume the average match cost of r is $M_r$, the average match cost of s
is $M_s$, and the average cost to expand a node is $g$.

$R - S$ = nodes saved only by rule r
$S - R$ = nodes saved only by rule s
$R \cap S$ = nodes saved by both rules r and s
$M_r$ = average match cost of rule r
$M_s$ = average match cost of rule s
$g$ = average cost to expand a node

Utility(X) is the utility of a set X of control rules. The interaction between two rules is the
amount by which their utilities are not additive:

$$\text{Residue} = \text{Utility}(\{r, s\}) - [\text{Utility}(\{r\}) + \text{Utility}(\{s\})] = |R - S| \times M_s + |S - R| \times M_r - |R \cap S| \times g \tag{2}$$

The residue in Equation 2 is the amount by which the utilities of r and s are not composable. The
rules combine synergistically if this value is positive. If negative, they engage in a harmful
interaction. Two rules with positive utility can potentially combine to yield a strategy worse than
using neither rule.
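Equation 2 can be checked against the fifteen-node example with a short Python sketch. Only the node sets come from the text; the cost values are hypothetical, and the utility function encodes a simplifying assumption that matches the counts quoted above: each rule in a strategy is matched once at every node that is still expanded, and each pruned node saves one expansion.

```python
def utility(pruned, total_nodes, match_costs, expand_cost):
    """Utility of a rule set under the simplified cost model assumed here.

    pruned      -- set of node ids removed by the rule set
    match_costs -- average match cost of each rule in the set
    Every rule is assumed to be matched once at each node still expanded.
    """
    expanded = total_nodes - len(pruned)
    return len(pruned) * expand_cost - expanded * sum(match_costs)


TOTAL = 15
R = set(range(3, 9)) | set(range(10, 13))  # nodes 3-8 and 10-12, pruned by r
S = set(range(9, 16))                      # nodes 9-15, pruned by s

# Hypothetical costs; only the node sets are taken from the example.
M_r, M_s, g = 0.5, 0.5, 2.0

u_r = utility(R, TOTAL, [M_r], g)
u_s = utility(S, TOTAL, [M_s], g)
u_rs = utility(R | S, TOTAL, [M_r, M_s], g)

residue_direct = u_rs - (u_r + u_s)
residue_eq2 = len(R - S) * M_s + len(S - R) * M_r - len(R & S) * g
print("Utility({r}) =", u_r, " Utility({s}) =", u_s, " Utility({r, s}) =", u_rs)
print("residue (direct) =", residue_direct, " residue (Equation 2) =", residue_eq2)
```

With these assumed costs both rules are individually helpful and the pair is still helpful, but the residue is negative: the combined utility is less than the sum of the parts because nodes 10-12 are pruned by both rules.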

Interactions force us to discard the notion of rule utility as defined in Equation 1. Instead, we
propose conditional utility to capture the benefit of a control rule. The conditional utility of a
rule is the change in performance a rule provides when added to an existing set of rules. Thus, for
the example in Figure 8, $\text{Utility}(s \mid r)$ is the utility of adding rule s to an existing
strategy of rule r alone. More generally, for any two sets of rules X and Y:

$$\text{Utility}(X \cup Y \mid \emptyset) = \text{Utility}(X \mid \emptyset) + \text{Utility}(Y \mid X) \tag{3}$$

where $\emptyset$ is the empty strategy. For a rule r, the utility of r in Equation 1 is equivalent
to its conditional utility with respect to the empty set of rules:
$\text{Utility}(r) = \text{Utility}(r \mid \emptyset)$.
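Conditional utility can be sketched in the same style. The code below is an illustration under stated assumptions, not the paper's implementation: the cost values are hypothetical, the cost model again charges each rule one match per expanded node, and conditional utility is computed simply as a difference of strategy utilities, so Equation 3 holds by construction.

```python
TOTAL_NODES = 15
EXPAND_COST = 2.0                                     # hypothetical g
PRUNES = {"r": set(range(3, 9)) | set(range(10, 13)),  # nodes 3-8 and 10-12
          "s": set(range(9, 16))}                      # nodes 9-15
MATCH_COST = {"r": 0.5, "s": 0.5}                     # hypothetical M_r, M_s


def utility(rules):
    """Utility of a strategy (a set of rule names) relative to the empty strategy."""
    pruned = set().union(*(PRUNES[name] for name in rules))
    expanded = TOTAL_NODES - len(pruned)
    match = sum(MATCH_COST[name] for name in rules)
    return len(pruned) * EXPAND_COST - expanded * match


def conditional_utility(new_rules, existing_rules):
    """Utility(Y | X): change in performance from adding Y to strategy X."""
    return utility(existing_rules | new_rules) - utility(existing_rules)


empty, r, s = frozenset(), frozenset({"r"}), frozenset({"s"})

s_alone = conditional_utility(s, empty)   # Utility(s | empty strategy)
s_given_r = conditional_utility(s, r)     # Utility(s | r)
chained = conditional_utility(r, empty) + s_given_r
print("Utility(s | empty) =", s_alone)
print("Utility(s | r)     =", s_given_r)
print("chained =", chained, " Utility({r, s}) =", utility(r | s))
```

The point of the example is that the value of s depends on what is already in the strategy: under these assumed numbers its utility in isolation differs from its utility once r is present.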

Ignoring these interactions can lead to degraded performance. As the savings estimate is fixed at
learning time, the perceived savings can diverge from the true average. The directly measured
parameters (Success_rate and Match_cost) are also impacted by this property. PRODIGY/EBL bases these
parameters on the average of many observations. For an average to be meaningful the observations
must be drawn from the same distribution. However, as control rules are acquired or discarded, the
distribution can change. As a result, these parameters have questionable semantics.

Figure 8: Example of interacting rules

6 DOCUMENTING MULTI-RULE MISTAKES 

The preceding section suggests another way that the utility analysis described by Equation 1b can
fail. In this section we illustrate a simple control rule interaction which degrades performance in
our assembly domain. Again we use two problem classes, each of which has equal representation.
The first class has forty bins of 20 parts each, most of which have defects. The second class has
forty bins of 20 parts each, all of which have defects.

The first problem class results in the generation of RULE-2 from Figure 7. In combination with the
second class of problems, this rule has high negative utility and, after a few subsequent problems,
utility analysis correctly discards it (notice that bin size does not vary in these problem classes). The
second problem class results in the generation of RULE-3 in Figure 9. This rule checks every part
in every bin, searching for a defect-free bin. If it does not find such a bin, it terminates problem
solving. This rule has high negative utility and is quickly discarded by utility analysis.

RULE-3: IF   candidate-node is ?node
             is-top-level-goal assembly-completed
             ∀ ?bin : is-bin(?bin)
                 ¬defect-free-components(?bin)
             ∃ ?part : is-part(?part ?bin)
                 ¬good(?bin ?part)
        THEN reject ?node

Figure 9: A control rule from the assembly domain 

A different situation arises if the system learns RULE-3 before it discards RULE-2. RULE-2 is
expensive to match on problems from the second class, and it provides no savings (there is no
defect-free bin to prefer). As a result, the problem takes much longer to solve. This greater problem
solving time is reflected in a greater savings estimate for RULE-3. With this estimate, RULE-3 is retained.
If RULE-2 remained in the system, this estimate would accurately reflect the savings for RULE-3.
However, as RULE-2 has negative utility, utility analysis eventually discards it. When it is
discarded, the estimate is not updated and RULE-3 is mistakenly retained.

We tested this domain using the same methodology as in Section 4. It is possible that the degradation
could arise through factors other than the composability problem. We control for this situation by
introducing another test condition. Our analysis indicates that RULE-3 is retained through an inter- 
action with RULE-2. If this analysis is correct, RULE-3 should not be learned if RULE-2 is never 
learned. The new test condition prevents the learning of RULE-2. 

Table 2 summarizes the results. PRODIGY/EBL learned the control strategy containing RULE-3. 
This degraded performance by a factor of three. When RULE-2 is suppressed, no rule is acquired, 
yielding results equivalent to the condition without learning (see footnote 3). This confirms that PRODIGY/EBL
retains RULE-3 through a control rule interaction. 

System Type         Execution Time (100 problems)
without learning    2292 ± 4 CPU sec.
with learning       7436 ± 81 CPU sec.
without RULE-2      2292 ± 4 CPU sec.

Table 2: Empirical results from multi-rule experiment

3. As the control condition acquired no control rules, the same timing data is reported for the
no-learning and the control conditions.

7 OPERATIONALITY CRITERION

In this section we argue that the limitations in PRODIGY/EBL's utility analysis translate into general
problems for speed-up learning. We illustrate this by considering the alternative argument.
PRODIGY/EBL can acquire harmful knowledge. But this is a reflection of two failures. First the
knowledge must be mistakenly generated and then it must be mistakenly retained. We have only
demonstrated the latter failure. A better rule generator would avoid the former. In fact, a perfect
rule generator would obviate the need for utility analysis. Much of the research in speed-up learning
investigates alternate criteria for generating knowledge.

The composability problem raises a serious obstacle to this argument. An operationality criterion is
designed to prevent the generation of harmful rules. However, the existence of rule interactions calls
into question the notion of a harmful rule. A control rule which is harmful in one context may
result in improved performance in a different context. Most criteria ignore these interactions (e.g.,
[Etzioni90a, Letovsky90, Mitchell86, Segre87, Subramanian90, Tambe89, Yamada89]). Furthermore,
reasoning about interactions can be costly. A set of l control rules yields 2^l distinct control
strategies (the power set of the l rules). In the worst case we must consider all these alternatives.
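
The exponential growth is easy to see by enumerating the strategies directly. The following is a minimal sketch; the helper name `all_strategies` is ours and is not part of any existing system.

```python
from itertools import combinations

def all_strategies(rules):
    """Yield every subset of `rules` (the power set), smallest subsets first."""
    rules = list(rules)
    for size in range(len(rules) + 1):
        for subset in combinations(rules, size):
            yield frozenset(subset)

strategies = list(all_strategies(["RULE-1", "RULE-2", "RULE-3"]))
print(len(strategies))  # 8, i.e. 2**3

# The count doubles with every additional rule.
count_for_ten_rules = sum(1 for _ in all_strategies(range(10)))
print(count_for_ten_rules)  # 1024, i.e. 2**10
```

Evaluating each candidate strategy requires running the planner over a sample of problems, so exhaustive comparison quickly becomes impractical as rules accumulate.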

The variance in savings also raises difficulties. Most current criteria ignore distribution information.
For example, the nonrecursive hypothesis [Etzioni90b] states that explanation-based learning "is
effective when it is able to curtail search via nonrecursive explanations." A recursive explanation
contains assertions which depend on instances of the same assertion. An example is where a sorted
list is explained by explaining how sublists are sorted. This hypothesis claims that beneficial rules
can be identified by their syntactic structure alone. A similar claim is stated in [Letovsky90,
Subramanian90] in the context of macro-operators.

The harmful control rules learned in our experiments are nonrecursive by the definitions in
[Etzioni90a, Letovsky90, Subramanian90], which directly contradicts the nonrecursive hypothesis. These
experiments solidly demonstrate that utility varies across problems. From this we must conclude
that utility for a rule depends on the problem distribution. For example, RULE-2 enhances
performance if we limit the distribution to problems with large bin size. Criteria which ignore
distribution information are insufficient. Furthermore, it is difficult to obtain this information. A
system must know more than the distribution of problems. It must know the distribution of rule
applications both within and across these problems. It must also know how features of these rule
applications impact utility. This is especially difficult as these features may not appear in the body
of the control rule (e.g., the utility for RULE-2 varies with the size of the bin).

Researchers have not addressed these limitations, in part, because of the historical development of
the field. Speed-up learning techniques evolved from earlier work in concept learning. A concept
learning system must learn classification rules to identify some target concept accurately. Thus, the
focus was on techniques which produced accurate generalizations of examples. In the context of
speed-up learning these techniques can accurately generalize the conditions for applying a control
decision. We have inherited this focus on accuracy. However, in speed-up learning, accuracy is no
longer the primary issue. Instead a system must balance accuracy with efficiency [Keller87].
Accuracy is only tenuously related to efficiency. For example, RULE-2 accurately predicts when a bin
of arbitrary size will succeed. It benefits the system if generalized to problems with large bin size.
However, its effects are disastrous when applied to the full range of sizes. But in each case the rule
is accurate.

The weak link between accuracy and efficiency is observed in other systems as well. For example, 
Carlson, Weinberg, and Fisher [Carlson90] learn strategies with a probabilistic concept hierarchy. 
This approach accurately eliminates fruitless alternatives, but it produces strategies with worse ex- 
ecution time (negative utility). A similar effect is observed in DAEDALUS, a case-based planner 
which incorporates macro-operators into a probabilistic concept hierarchy [Allen90].

8 PERFORMANCE ELEMENT 

Different problem solvers implement different search mechanisms. One way to view this is to say
that a problem solver implements a body of default control knowledge. From this perspective, the 
composability problem suggests that the same learned control knowledge should have different util- 
ity when used with different problem solvers. Indeed, Mooney demonstrated this in several experi- 
ments [Mooney89]. He shows that macro-operators have very different effects when used with a 
depth-first planner or a breadth-first planner. 

9 COMPOSER 

Utility is a complex function of the problem solver, the structure of the domain theory, a possibly 
unknown problem distribution, and other learned knowledge. In this section we introduce a statisti- 
cal approach to utility analysis, called COMPOSER, which addresses these issues. The technique 
is implemented in conjunction with PRODIGY/EBL and takes the place of the utility analysis of 
Equation 1b.

Equation 3 suggests a simple hill-climbing approach for avoiding interactions. If a control rule has
positive conditional utility with respect to a control strategy X, adding the control rule to X must
result in a more effective strategy. The greedy technique begins with X initialized to the empty set
and incrementally adds to X a control rule with the highest estimated conditional utility with respect
to X. This cycle continues until no rule remains with positive conditional utility. In this way the
problem of finding an effective control strategy is reduced to the problem of finding a control rule
with positive conditional utility.
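
The greedy cycle can be sketched in a few lines. This is an idealized illustration that assumes conditional utility can be queried exactly through a caller-supplied function; in practice it must be estimated from observations. The names `greedy_strategy`, `cond_util`, and the cost table are hypothetical.

```python
def greedy_strategy(candidates, conditional_utility):
    """Greedy hill climbing: repeatedly add the rule with the highest positive conditional utility."""
    strategy = frozenset()
    remaining = set(candidates)
    while remaining:
        # Rank the remaining rules by their utility given the current strategy.
        best = max(sorted(remaining), key=lambda rule: conditional_utility(rule, strategy))
        if conditional_utility(best, strategy) <= 0:
            break  # no remaining rule improves the current strategy
        strategy = strategy | {best}
        remaining.remove(best)
    return strategy

# Hypothetical solution times (seconds) per strategy, used only to drive the example.
COST = {
    frozenset(): 100.0,
    frozenset({"r"}): 80.0,
    frozenset({"s"}): 90.0,
    frozenset({"r", "s"}): 110.0,
}

def cond_util(rule, strategy):
    """Time saved by adding `rule` to `strategy` under the assumed cost table."""
    return COST[strategy] - COST[strategy | {rule}]

chosen = greedy_strategy(["r", "s"], cond_util)
print(sorted(chosen))
```

With these assumed costs the procedure adds r, then stops because s has negative conditional utility once r is active, even though s has positive utility in isolation.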

PRODIGY/EBL misrepresents the utility of a single control rule because it is restricted to a single
observation of the rule's savings. We avoid this limitation by using many observations. These
observations are combined to derive a mean utility and a confidence interval on that mean. Observations
are made with respect to the current control strategy, and the method allows multiple rules to be
evaluated simultaneously.

The COMPOSER approach works in conjunction with an existing planner and control rule generating
system. Our implementation is built on top of the PRODIGY/EBL system but it can be readily
adapted to work with alternative rule conjecturing schemes. PRODIGY/EBL is provided with several
control rule classes. Our implementation currently supports only a subset of these classes. We
implement rejection rules and selection rules which unequivocally remove alternatives. Preference
rules are not implemented but we anticipate little difficulty in extending the approach to this class.

9.1 Gathering Observations 

Learning proceeds much as in PRODIGY/EBL. The planner generates solutions and a problem-solving
trace. As in PRODIGY/EBL, the trace includes the resources spent at each node, including time
spent evaluating control rules. The PRODIGY/EBL learning module analyzes this trace and
conjectures control rules. However, instead of being added directly to the current control strategy,
these rules are placed on a list of pending rules. Pending rules are allowed to match against the current
planner state and the match cost is recorded. However, the actions of pending rules are not performed.
Rather, the system annotates the problem trace with the choices it would have eliminated. After a
problem run is complete, the cost of each subtree which would have been pruned can be determined.
If the control rule is checked but does not apply, it is credited with zero savings.

We can illustrate this with an example. Recall RULE-1 in Figure 2 and the blocksworld search space,
reproduced in Figure 10. If RULE-1 is on the pending list, it is consulted as the planner explores
(generates) the problem space. In this example the rule applies at node N1. If the rule were
allowed to apply it would eliminate the first six alternatives. As the rule is on the pending list, these
alternatives are not eliminated. Instead a marker is placed on each link. After problem solving is
complete, these markers are identified and the resources expended in the subtree below the marker
are recorded. This total is the potential savings for the particular rule application associated with
that marker.
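
The bookkeeping for markers can be sketched as a walk over the annotated trace. This is an assumed data layout for illustration only: the `Node` class, its fields, and the cost values are hypothetical and do not reflect PRODIGY's internal trace format.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cost: float                                   # resources spent at this node itself
    children: list = field(default_factory=list)
    marked: bool = False                          # True if the pending rule would have pruned the link to this node

def subtree_cost(node):
    """Total resources expended at `node` and everywhere below it."""
    return node.cost + sum(subtree_cost(child) for child in node.children)

def application_savings(node):
    """Potential savings of one rule application at `node`: the cost of every marked child subtree."""
    return sum(subtree_cost(child) for child in node.children if child.marked)

# A small hypothetical trace: two alternatives would have been pruned, one leads to success.
n1 = Node("N1", 1.0, [
    Node("N2", 2.0, marked=True),
    Node("N3", 3.0, [Node("N5", 1.5)], marked=True),
    Node("N9", 0.5),
])

print(application_savings(n1))
```

Here the savings credited to the application at N1 is the work spent under N2 and N3, while the unmarked path to N9 is not counted.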



[Figure 10 diagram: blocksworld search tree from the INITIAL STATE toward the GOAL. Node N1: (clear B)
branches on alternatives including PUTDOWN(A), PUTDOWN(B), STACK(A,A), STACK(A,B), and STACK(B,A),
leading to nodes N2 through N11, where N9 is the success node. Markers on links indicate successful
applications of RULE-1.]

Figure 10: COMPOSER analyzing a search space for the blocksworld domain

There are three additional points. First, the system must not attempt a pending control rule in a
portion of the space which would have been trimmed by that rule. For example, RULE-1, if activated,
trims the first six alternatives of N1. Since the rule is pending, these alternatives are explored. It
is possible that the rule applies in other nodes within these alternatives. For example, the rule
potentially applies at nodes N5 and N7. These must not be considered as valid applications because they
would never have been reached if the control rule were activated.

Second, we require that pending rules do not affect the behavior of the planner. However, the planner
expends resources to evaluate the preconditions of pending rules. If the planner has resource bounds
(as in PRODIGY), these bounds must be insensitive to this additional cost. The match cost of pending
rules must also be discounted when summing the resources expended below an application marker.

Finally, all of the observations are contingent on a particular control strategy. All observations must
be discarded each time a new control rule is added to the current strategy. To see why context effects
are important, imagine that a rule which eliminates the STACK alternatives is selected as the next
active rule. Before this addition, RULE-1 saved all six unsuccessful alternatives in the example.
With this new rule, RULE-1 only avoids the two PUTDOWN alternatives. If we did not discard the
old observations, the estimate would be skewed by these higher savings observations.
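
The first two points can be made concrete with a small sketch. It assumes, for simplicity, that the pending rule is checked at every node of the trace; the `TraceNode` layout and all numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    work: float                                   # planner resources at this node, excluding pending-rule matching
    match: float = 0.0                            # resources spent matching the pending rule at this node
    pruned: list = field(default_factory=list)    # children the rule would have eliminated
    kept: list = field(default_factory=list)      # children the rule would have left alone

def work_below(node):
    """Resources in a subtree, with the match cost of the pending rule discounted."""
    return node.work + sum(work_below(child) for child in node.pruned + node.kept)

def utilities(node):
    """Utility observations (savings minus match cost) at nodes an active rule could still reach."""
    savings = sum(work_below(child) for child in node.pruned)
    observations = [savings - node.match]
    for child in node.kept:                       # never descend into subtrees the rule would have pruned
        observations.extend(utilities(child))
    return observations

root = TraceNode(
    work=1.0, match=0.5,
    pruned=[TraceNode(work=4.0, match=0.5, pruned=[TraceNode(work=2.0, match=0.5)])],
    kept=[TraceNode(work=3.0, match=0.5)],
)

print(utilities(root))
```

The nested application inside the pruned subtree produces no observation, and the match cost spent there does not inflate the savings credited to the application at the root.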

9.2 Estimating Utility 

We now have a mechanism for gathering observations of conditional utility. To identify the pending
rule with the highest conditional utility we must compute an average conditional utility for a control
rule. We must also place a confidence bound on this mean. Only control rules which have positive
utility with high confidence will be added to the current strategy. Deriving a bound is difficult because
utility varies within any given problem. Furthermore, different problems will have different patterns
of variance. For example, one problem may be dominated by control rule applications which have
positive utility, while another problem may be dominated by applications of negative utility. The
final mean must reflect the composite of these individual distributions. Standard statistical
approaches require sampling utility randomly from any place within any problem. Unfortunately the
constraints of problem solving force us to sample at the level of complete problems. This means
that our observations will consist of all the rule applications in one randomly selected problem,
followed by all the rule applications of the next randomly selected problem, etc. We describe a
statistical technique known as cluster sampling which is designed for this task. Our description of this
technique is derived from the presentation in [Kish65 pp. 148-216].

The basic problem is that both the total utility (i.e., the numerator of the sample mean) and the
number of rule applications (i.e., the denominator of the sample mean) are random variables. Because
we are sampling problems randomly from a population of problems and the number of control rule
applications within problems is not constant across problems, the sample size is random. The
sampling plan can be thought of as a cluster sample with problems representing the clusters and rule
applications representing observations within the clusters. Rule applications appear in the sample
because their problem was selected (i.e., problems are the primary sampling unit).

We first introduce some notation. These definitions are with respect to a particular rule, R.

$u_{ij}$ is the utility of the ith rule application of R within the jth problem. Utility is the savings
resulting from that application minus the match cost for that application.

$x_j$ is the number of applications of rule R in problem (cluster) j.

$u_j = \sum_{i=1}^{x_j} u_{ij}$ is the sum of the utilities for each application of rule R in problem j.
This is also called the sample total for cluster j.

$u = \sum_{j=1}^{a} u_j$ is the sum of the sample totals for each problem, where there are a
problems selected from a population of size A.

$x = \sum_{j=1}^{a} x_j$ is the total number of applications of rule R in the selected problems.

Then the average utility of R over its applications is

$$r = \frac{u}{x} = \frac{1}{x} \sum_j u_j = \frac{\sum_j u_j}{\sum_j x_j}$$

Note that r is also a weighted mean of the problem means:

$$r = \frac{u}{x} = \frac{1}{a} \sum_j \left[ \frac{x_j}{x/a} \right] y_j, \quad \text{where } y_j = \frac{u_j}{x_j}$$

Thus, the average utility of rule R for each problem is weighted by the specific number of rule
applications in that problem relative to the average number of rule applications across all problems.
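
The two expressions for r can be compared numerically. The sketch below uses hypothetical utility observations grouped by problem, and assumes every selected problem contains at least one application of the rule (otherwise a problem mean is undefined).

```python
# Hypothetical utility observations u_ij, grouped by problem (cluster) j.
observations = [
    [4.0, -1.0, 3.0],   # problem 1: x_1 = 3 applications
    [2.0],              # problem 2: x_2 = 1 application
    [-2.0, -1.0],       # problem 3: x_3 = 2 applications
]

def ratio_mean(clusters):
    """r = u / x: total utility divided by total number of rule applications."""
    u = sum(sum(cluster) for cluster in clusters)
    x = sum(len(cluster) for cluster in clusters)
    return u / x

def weighted_mean_of_problem_means(clusters):
    """(1/a) * sum_j [x_j / (x/a)] * y_j, where y_j is the mean utility within problem j."""
    a = len(clusters)
    x = sum(len(cluster) for cluster in clusters)
    return sum((len(c) / (x / a)) * (sum(c) / len(c)) for c in clusters) / a

r = ratio_mean(observations)
r_weighted = weighted_mean_of_problem_means(observations)
print(r, r_weighted)
```

Both computations give the same value up to floating-point rounding, which illustrates that problems with more rule applications contribute proportionally more to the mean.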

Given these relations and definitions, the variance of the average utility for rule R is found to be:

$$\mathrm{Var}(r) = \frac{1}{x^2} \left[ \mathrm{Var}(u) + r^2 \, \mathrm{Var}(x) - 2 r \, \mathrm{Cov}(u, x) \right]$$

where Var( ) indicates the variance of the variable in ( ) and Cov( ) indicates the covariance between
the two variables listed in ( ). The variables are as previously defined.

This formula is approximate and is reliable only if se(x)/x < 0.20, where se(x) is the standard error of
x.

There are many equivalent expressions for the variance. We will utilize the following expression.
A derivation can be found in [Kish65 p. 189, Equation 6.3.6] (see footnote 4):

$$\mathrm{Var}(r) = \frac{1}{a(a-1)} \sum_j \left[ \frac{x_j \, a}{x} \, (y_j - r) \right]^2$$

That is, the squared deviations of the problem means from the grand mean, $(y_j - r)^2$, are weighted
by the (squared) relative sample sizes in the clusters, $x_j / (x/a)$.
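
The equivalence of the two variance expressions can be checked numerically. This sketch assumes that Var(u), Var(x) and Cov(u, x) are the estimated variances and covariance of the sample totals, computed as a/(a-1) times the sums of squared (or cross) deviations of the per-problem values from their means, and it omits the finite-population factor. The observations are hypothetical.

```python
def var_from_components(clusters):
    """(1/x^2) [Var(u) + r^2 Var(x) - 2 r Cov(u, x)] for per-problem observation lists."""
    a = len(clusters)
    totals = [sum(c) for c in clusters]     # u_j
    counts = [len(c) for c in clusters]     # x_j
    u, x = sum(totals), sum(counts)
    r = u / x
    scale = a / (a - 1)
    var_u = scale * sum((uj - u / a) ** 2 for uj in totals)
    var_x = scale * sum((xj - x / a) ** 2 for xj in counts)
    cov_ux = scale * sum((uj - u / a) * (xj - x / a) for uj, xj in zip(totals, counts))
    return (var_u + r * r * var_x - 2 * r * cov_ux) / x ** 2

def var_from_deviations(clusters):
    """1/[a(a-1)] * sum_j [(x_j a / x)(y_j - r)]^2; clusters without applications contribute zero."""
    a = len(clusters)
    x = sum(len(c) for c in clusters)
    r = sum(sum(c) for c in clusters) / x
    spread = sum(((len(c) * a / x) * (sum(c) / len(c) - r)) ** 2 for c in clusters if c)
    return spread / (a * (a - 1))

# Hypothetical utility observations grouped by problem.
data = [[4.0, -1.0, 3.0], [2.0], [-2.0, -1.0]]
v_components = var_from_components(data)
v_deviations = var_from_deviations(data)
print(v_components, v_deviations)
```

The second form needs only the per-problem counts and means, which makes it convenient to update after each problem is solved.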

9.3 Putting it Together 

With cluster sampling we can combine the observations of conditional utility for a particular control
rule into a meaningful average and a bound on that average. An average and bound are maintained
for each rule on the pending list. After each problem-solving attempt, COMPOSER updates the
statistics for pending rules and then considers incorporating a control rule into the current strategy. A
control rule is only considered for inclusion if it has positive utility with confidence δ.
For our current implementation δ is set arbitrarily at 95%. After each problem is executed, the
system checks if any rules satisfy the confidence requirement. If so, COMPOSER adds the rule with the
highest positive utility to the current strategy, and removes this rule from the pending list. Statistics
for the remaining pending rules are discarded as they are meaningless in the context of the resulting
control strategy. The same method identifies rules with negative utility. If a control rule has negative
utility with confidence δ, it is eliminated from the pending list. This operation does not affect the
current strategy, so the statistics associated with the remaining pending rules are left unchanged.
This cycle is repeated until the training set is exhausted.
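
One update of this cycle might look as follows. The sketch is not the COMPOSER implementation: it assumes a one-sided normal approximation for the 95% confidence test, and the function names, rule names and observations are hypothetical.

```python
import math

Z_ONE_SIDED_95 = 1.645  # normal quantile; the normal approximation is an assumption of this sketch

def mean_and_variance(clusters):
    """Ratio mean r = u/x and its cluster-sampling variance for per-problem observation lists."""
    a = len(clusters)
    x = sum(len(c) for c in clusters)
    r = sum(sum(c) for c in clusters) / x
    spread = sum(((len(c) * a / x) * (sum(c) / len(c) - r)) ** 2 for c in clusters if c)
    return r, spread / (a * (a - 1))

def decide(clusters, z=Z_ONE_SIDED_95):
    """Return 'accept', 'reject', or 'pending' for one rule."""
    if len(clusters) < 2 or not any(clusters):
        return "pending"                      # too little data to bound the mean
    r, variance = mean_and_variance(clusters)
    half_width = z * math.sqrt(variance)
    if r - half_width > 0:
        return "accept"
    if r + half_width < 0:
        return "reject"
    return "pending"

def step(pending):
    """One update: drop rejected rules; if any rule is accepted, pick the best and reset statistics."""
    verdicts = {rule: decide(obs) for rule, obs in pending.items()}
    survivors = {rule: obs for rule, obs in pending.items() if verdicts[rule] != "reject"}
    accepted = [rule for rule in survivors if verdicts[rule] == "accept"]
    if not accepted:
        return None, survivors                # strategy unchanged, so statistics are kept
    best = max(sorted(accepted), key=lambda rule: mean_and_variance(pending[rule])[0])
    # Observations gathered under the old strategy are discarded for the remaining rules.
    return best, {rule: [] for rule in survivors if rule != best}

pending = {
    "good":   [[5.0, 6.0], [5.0], [6.0, 5.0, 6.0]],
    "bad":    [[-3.0, -4.0], [-4.0], [-4.0, -3.0]],
    "unsure": [[5.0], [-5.0], [1.0]],
}
added, remaining = step(pending)
print(added, remaining)
```

In this example the rule with consistently positive observations is added, the consistently negative rule is dropped, and the rule with mixed observations stays pending with its statistics cleared.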

9.4 Evaluating COMPOSER 

Necessarily, adding a rule of positive conditional utility will increase the efficacy of the composite
strategy. The COMPOSER technique could fail, however, if conditional utility is not properly
estimated. To test this possibility, we ran a series of experiments which are summarized in Figure
11. These graphs illustrate learning curves where the independent measure is the number of random
training examples and the dependent measure is execution time for 100 test problems. The
methodology is identical to that described in Section 4. As COMPOSER does not implement preference
rules, differences in performance could be attributed to this difference, rather than the improved
utility analysis. To control for this effect we tested two versions of PRODIGY/EBL: the default version
and a version which cannot learn preference rules. Results are presented for the more effective of
these two systems, which in each case was the system without preference rules. All problem
generators were supplied with the PRODIGY 2.0 system. More effective strategies have lower solution
times. Ideally COMPOSER should be compared against the optimum control strategy but it is
computationally infeasible to do this. Instead we provide PRODIGY without learning and PRODIGY/EBL
as benchmarks. The systems are tested on two domains from [Minton88] and the domain in
[Etzioni90a] for which PRODIGY/EBL produced harmful strategies. The system could not be tested on
the two domains reported here as these involve preference rules which have not been implemented
in COMPOSER. However, in similar domains which did not involve the learning of preference rules,
COMPOSER accurately avoided learning harmful control rules.

4. Kish multiplies this equation by a factor (1-f), where f is the probability of selecting a particular
rule application, which is the same for all applications. Specifically, f = f_a f_b, where f_a is the ratio
a/A (the number of problems selected relative to the size of the population of available problems), and
f_b is the fixed probability of selecting a rule application within a problem. In our case we use all rule
applications associated with a problem. Consequently, f_b = 1.0. In addition we assume the general case
where there is an infinite number of problems. In this case a/A approaches zero. Tighter bounds can be
achieved if A is finite.




[Figure 11 graphs: three learning-curve panels; x-axis: # of training examples (10 to 100); legend: No
learning, COMPOSER, PRODIGY/EBL.]

            COMPOSER                        PRODIGY/EBL                     No Learning
DOMAIN      Rules Learned   Solution Time   Rules Learned   Solution Time   Solution Time
A           2               177 sec.        14              238 sec.        390 sec.
B           4               344 sec.        23              724 sec.        2436 sec.
C           1               178 sec.        9               293 sec.        229 sec.

Figure 11. Summary empirical results

The results illustrate several interesting features. On all domains COMPOSER exceeded the
performance of PRODIGY/EBL. An important result is that the execution times associated with
COMPOSER are monotonically decreasing. This suggests that conditional utility is accurately estimated. A
surprising fact is that in the domains where COMPOSER acquired a strategy, only one or two control
rules account for most, if not all, of the savings. This indicates that most of the rules acquired by
PRODIGY/EBL are, at best, superfluous.
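
The relative improvements follow directly from the solution times in the summary table. The short computation below only restates those reported times as speedup factors over the planner without learning; the variable names are ours.

```python
# Solution times in seconds, copied from the summary table of empirical results.
times = {
    "A": {"COMPOSER": 177, "PRODIGY/EBL": 238, "No learning": 390},
    "B": {"COMPOSER": 344, "PRODIGY/EBL": 724, "No learning": 2436},
    "C": {"COMPOSER": 178, "PRODIGY/EBL": 293, "No learning": 229},
}

# Speedup over the planner without learning; values below 1 mean learning slowed the planner down.
speedup = {
    domain: {system: row["No learning"] / row[system] for system in ("COMPOSER", "PRODIGY/EBL")}
    for domain, row in times.items()
}

for domain, row in speedup.items():
    print(domain, {system: round(value, 2) for system, value in row.items()})
```

In domain C the factor for PRODIGY/EBL falls below 1, which corresponds to the harmful strategy noted in the text, while COMPOSER remains faster than the unaided planner in all three domains.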

9.5 Limitations and Extensions 

The COMPOSER technique corrects limitations of previous utility estimation techniques, but this
guarantee may come at a considerable cost. Rules of high utility may require many examples before
reaching an acceptable level of confidence. More importantly, the estimation of utility requires that
rule preconditions be evaluated many times within a training problem, and each precondition match
requires an expense of resources. If the number or size of inactive rules grows too large,
training problems may take prohibitively long to solve. Resolution of this issue requires making
conservative choices for which rules to consider, quickly discarding bad choices, and relaxing some
guarantees. A number of these approaches are discussed below.

One interesting extension involves exploring the use of an approximate domain theory for judging
utility to drive the EBL component. Currently, the EBL component does not use a theory of utility
at all. Instead it exploits only a theory of rule correctness. Since the hybrid method relaxes the need
for a complete and correct domain theory, an approximate theory may be possible.

Alternatively, or in addition, a correct but incomplete utility domain theory might be entertained.
It may be possible, for example, to fashion a theory that recognizes sub-cases in which two rules,
R and S, have identical effects but R has more general preconditions. From this information we can
conclude that R subsumes the savings of S and that R and S should never appear together in the
same strategy. If we can further state that the match cost of R is less than the match cost of S then
R dominates S. Any strategy containing R is guaranteed to have higher utility than a strategy
containing S.

The empirical component estimates conditional utility for a rule across all problems in the
distribution. In practice, rule utility varies systematically across different problems. For example, the
savings of the rule in Figure 2 are a function of the number of blocks in the initial state. If problems can
be classified based on features which affect rule utility, tighter utility bounds may be achieved. This
extension would allow a flexible control strategy which utilizes different rules for different problem
classes.

Another important consideration is that the greedy reductionist algorithm is a hill-climbing
technique and thus, while guaranteeing improvement, may terminate with a non-optimal strategy. It is
also conceivable that no strategy will be found when beneficial strategies do, in fact, exist. It can
happen that all rules have individual negative conditional utilities but combine synergistically to
produce a good strategy, confounding the greedy approach. The method for combining rules can
be viewed as a strong bias on the space of possible control strategies. The appropriateness of this
bias needs further investigation.

Finally, it is useful to consider when simplifications of this technique are sufficient to produce
positive strategies. This is an important consideration because the guarantees provided by COMPOSER
may come at a considerable cost in increased learning time. For example, the PRODIGY/EBL system
[Minton88] does not address the composability problem and yet has demonstrated success on a
number of domains. Equation 2 indicates that in the case where control rules are nearly independent,
conditional utility can be approximated by a measure which is independent of the current strategy.

10 RELATED WORK 

COMPOSER is one approach to the utility generalization and composability problems. In this sec- 
tion we describe other work which addresses these issues. We have organized the presentation into 
four basic trends. 

10.1 Elaboration

The problems with PRODIGY/EBL arise from its simplified cost model. A natural approach is to
elaborate the model. COMPOSER is one such elaboration. Leckie and Zukerman describe another
approach [Leckie91]. They present an inductive system which reasons about some control rule
interactions. They define a global cost model which is a function of a finite set of possible control
rules. The problem of which rules to keep is reduced to the problem of minimizing this function.
The model makes several assumptions. For example, all rules are considered to have the same
constant match cost. Even so, the model must entertain all 2^l alternative combinations of l control rules.

10.2 Simplification

Complete models of utility appear intractable. An alternative is to introduce deliberate
simplifications into a complete model. Any simplification will partition domains into two sets: those for
which the simplification is appropriate, and those for which it is not. For example, Equation 1b is a
simplified model which operates correctly in the domains reported in [Minton88] and incorrectly in the
domains reported here. Given a simplification, we must formalize the qualities of a domain which
affect its classification. This allows potential users to decide if the tool fits their problem. This paper
can be viewed as a preliminary attempt to formalize the properties of domains with respect to
Equation 1b. Formal treatments of the macro-operator approach appear in [Greiner89, Korf87,
Tadepalli91].

The work of Oren Etzioni is similar in spirit [Etzioni90a]. Etzioni looked extensively into control
rules which were rejected by PRODIGY/EBL's utility analysis. He then identified a property of these
rules which seemed to hold across multiple domains. He captured this in a syntactic criterion, the
nonrecursive hypothesis. This transfers an aspect of utility analysis into the rule generator. We have
demonstrated in this paper that the nonrecursive criterion is a simplification. The next step is then
to identify the domain constraints which influence the accuracy of this method.

Admittedly there are many possible simplifications, many of which create useless distinctions. An 
alternative approach is to identify a set of "natural" domains and design simplifications appropriate 
to them. Unfortunately there is little consensus on the extent of this set. Leaving these problems 
aside, regularities in these domains can suggest simplifications to a complete model of utility. We
are not aware of any research in this area. 

10.3 Specificity

One of the primary reasons for the utility generalization and composability problems is that a control
strategy is required to improve performance over an entire set of problems. Thus, knowledge
acquired during one problem can affect performance on every other problem solved. A beneficial rule
may well decrease performance on some problems as long as it makes up for this in other
enhancements. The resulting tradeoffs can be quite complex.

This need for a global performance improvement exacerbates the utility generalization problem.
A control rule may have to make recommendations about vastly different problems. Thus its savings
can be expected to have wide variance as well. This global property also ensures that many irrelevant
rules will be entertained while solving a particular problem, increasing the opportunity for
interactions.

A natural alternative is to be conservative about rule use: do not generalize a control rule to apply
at every legal opportunity. For example, a system could only consider a control rule if it was learned
on a problem which is "similar" to the current problem being solved. This approach is taken by
Fisher and Yoo [Fisher91], where problem classification rules are used to control search. There is also
psychological evidence that humans perform limited generalization in the context of problem solving
[Medin89]. That work explains this effect in terms of a case-based reasoning model.

10.4 Theories of Utility

As we mentioned, speed-up learning techniques have focused on generalization techniques which
preserve the accuracy of control decisions. Utility considerations have been patched on as a filter
to traditional generalization techniques. An alternative is to identify utility-preserving
generalizations. For example, PRODIGY/EBL constructs control rules based on a theory of rule accuracy. An
alternative would be to generate rules from a theory of rule utility. Hirsh suggested such an approach
in [Hirsh87].

11 CONCLUSION

The PRODIGY/EBL system [Minton88] was one of the first works to directly attack the problem of
strategy utility. The problem of finding effective strategies was reduced to the problem of finding
effective rules. However, this paper illustrates limitations of the approach. There are two basic
difficulties. The first arises from the fact that the utility of a control rule cannot be accurately determined
from a single instance of the rule. This is a manifestation of a more basic problem which we term
the utility generalization problem. The difficulty is that the generalization techniques employed by
speed-up learning systems are accuracy preserving but not utility preserving.

The second difficulty is that control rules interact such that the utility of one control rule is a function
of the other control rules in the system. This composability problem means that systems cannot
reduce the problem of learning effective strategies to the problem of identifying rule utility in isolation.

We documented the seriousness of these problems with an example domain theory. With this theory,
PRODIGY/EBL generated control strategies which were up to seventeen times slower than the
original planner. While this raises serious questions about the effectiveness of PRODIGY/EBL, we also
claim that the utility generalization and composability problems are basic issues which are not
adequately addressed by current speed-up learning techniques.

Finally, we introduced an alternative technique called COMPOSER. This system is based on a sound
statistical model which is validated with a series of experiments. COMPOSER successfully avoids
the utility generalization and composability problems. However, the technique may result in
substantially higher learning cost. Our future research seeks to reduce this learning cost by identifying
acceptable simplifications to the complete model.